The article presents Fast-dLLM, a method for accelerating diffusion-based large language models (LLMs) through a block-wise Key-Value (KV) cache and a confidence-aware parallel decoding strategy. The approach improves inference throughput by up to 27.6x with minimal accuracy loss, making diffusion LLMs competitive with autoregressive models and demonstrating their potential for practical, real-world deployment.
Keywords: diffusion, language models, acceleration
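To make the confidence-aware parallel decoding idea described above more concrete, here is a minimal sketch in Python. It assumes a hypothetical `model` callable that returns per-position logits over the whole sequence and that the block-wise KV cache for already-finalized blocks is handled inside the model; the function name, threshold value, and helpers are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def decode_block_parallel(model, tokens, block_start, block_end,
                          mask_id, threshold=0.9, max_steps=32):
    """Iteratively fill masked positions in one block, committing in parallel
    every position whose top-token confidence exceeds `threshold`.

    Assumptions (hypothetical interface):
      - `tokens` is a 1-D LongTensor of token ids, with `mask_id` at undecoded slots.
      - `model(tokens)` returns logits of shape [seq_len, vocab_size] and reuses
        a block-wise KV cache for earlier, already-finalized blocks internally.
    """
    for _ in range(max_steps):
        # positions in the current block that are still masked (block-relative)
        masked = (tokens[block_start:block_end] == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break  # block fully decoded

        logits = model(tokens)                                  # [seq_len, vocab]
        probs = torch.softmax(logits[block_start:block_end], dim=-1)
        conf, pred = probs.max(dim=-1)                          # confidence and argmax per position

        # accept, in parallel, all masked positions above the confidence threshold
        accept = masked[conf[masked] >= threshold]
        if accept.numel() == 0:
            # fall back to committing the single most confident masked position
            accept = masked[conf[masked].argmax().unsqueeze(0)]

        tokens[block_start + accept] = pred[accept]
    return tokens
```

In this sketch, decoding cost drops because several tokens per forward pass are committed whenever the model is sufficiently confident, while low-confidence positions stay masked and are revisited in later iterations; the thresholded acceptance is what trades a small amount of accuracy for the reported throughput gains.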